1 Introduction
This is my final project for ESS 580A7, Introduction to Environmental Data Science. Each chapter is an assignment from one week of the course.
The beginning of each assignment contains code written by Matthew Ross and Nathan Mueller.
2 Assignment 1: RMarkdown Examples
title: "Discharge Data Example"
date: "Last compiled on March 14, 2022"
output:
  html_document:
    toc: true
    toc_float: true
    toc_depth: 3
2.1 Methods
The Poudre River at Lincoln Bridge is:
Downstream of only a little bit of urban stormwater
Near Odell Brewing CO
Near an open space area and the Poudre River Trail
Downstream of many agricultural diversions
2.1.1 Site Description

2.2 Data Acquisition and Plotting Tests
2.2.1 Data Download
#Download the Data
q <- readNWISdv(siteNumbers = '06752260',
parameterCd = '00060',
startDate = '2017-01-01',
endDate = '2022-01-01') %>%
rename(q = 'X_00060_00003')
2.2.2 Static Data Plotter
#Plot the Data with ggplot
ggplot(q, aes(x = Date, y = q)) +
geom_line() +
ylab('Q (cfs)') +
ggtitle('Discharge in the Poudre River, Fort Collins')
2.2.3 Interactive Data Plotter
#Plot A Dygraph
q_xts <- xts(q$q, order.by = q$Date)
dygraph(q_xts) %>%
dyAxis("y", label = "Discharge (cfs)")
2.3 Assignment
This assignment will primarily be about demonstrating some expertise in using RMarkdown, since we will be using Rmds as the primary form of homework and assignments. With that in mind, your assignment for this homework is to:
2.3.1 Question 1-3:
1: Fork the example repository into your personal GitHub (Completed)
2: Create an RStudio project from your Personal clone of the Repo (Completed)
3: Create a table of contents that is floating, but displays three levels of headers instead of two (by editing the content at the beginning of the document) (Completed)
2.3.2 Question 4:
Make a version of the dygraph with points and lines by using RStudio's dygraph guide
#Plot A Dygraph
dygraph(q_xts) %>%
dyOptions(drawPoints = TRUE, pointSize = 2) %>%
dyAxis("y", label = "Discharge (cfs)")
2.3.3 Question 5:
Write a paragraph on the Poudre River with at least three hyperlinks, two bolded sections, and one italicized phrase. The content of this paragraph is not vital, but try to at least make it true and interesting, and, of course, don't plagiarize.
The Poudre Itself
The Cache la Poudre River, also known as The Poudre, is a river beloved by Fort Collins residents. Beginning in Rocky Mountain National Park, and ending when it merges with the South Platte River near Greeley, CO, the Poudre spans 76 miles. From start to finish, the Poudre has about a 7,000 feet elevation change.
The Poudre & Recreation
The Poudre River provides Northern Colorado with many different types of recreation, including whitewater rafting, fishing, and floating. For whitewater rafting, the Poudre offers multiple options for different ability levels. From easy Class I rapids to a Class V waterfall rapid, you can pick and choose from multiple trips to tailor the experience to your comfort and ability. For more information or to book a trip, check out this rafting site. If whitewater rafting isn't your thing, you may enjoy fly fishing. The Poudre has a ton of fish and is well known for its trout-rich waters. No matter where along the river you decide to stop and drop a line, you'll find fish, but this guide has a thorough map with locations, and information about rules and regulations. If being in the water doesn't float your boat, there are many ways to enjoy the Poudre while staying dry, such as hiking, hammocking, biking, and walking alongside it. The Poudre has carved a beautiful canyon over time, with many "gulches". These gulches are particularly beautiful and make for great hikes. Once you've done a hike and need a break, the Poudre also has numerous picnic spots that make for great hammocking. Whether you embark on a great adventure along the Poudre or just head up the canyon for a relaxing drive, you'll have a good day.
2.3.4 Question 6-9:
6: Knit that document, and then git commit and push to your personal GitHub. (Completed)
7: Use the GitHub -> Settings -> Pages tab to create a website of your report. (Completed)
8: Bonus, make the timestamp in the header dynamic, so that it shows today's date rather than a static date you enter. (Completed)
9: Bonus, create an “index_talk.Rmd” version of your document using the
revealjs package. Add link to your original report-style document. (Completed)
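The dynamic timestamp from bonus question 8 can be written with inline R in the YAML header; a minimal sketch (the format string is one choice among many):

```yaml
date: "Last compiled on `r format(Sys.Date(), '%B %d, %Y')`"
```

Because the expression is evaluated at knit time, the rendered document always shows the date it was last compiled.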
3 Assignment 2: Fire Data Wrangle
title: "Hayman Fire Recovery"
author: "Samantha Clark"
date: "10/3/2019"
output:
  html_document: default
  pdf_document: default
# Prepare Libraries Needed
library(tidyverse)
library(tidyr)
library(ggthemes)
library(lubridate)
library(knitr)
# Now that we have learned how to munge (manipulate) data
# and plot it, we will work on using these skills in new ways
knitr::opts_knit$set(root.dir='.')
####-----Reading in Data and Stacking it ----- ####
#Reading in files
files <- list.files('./data',full.names=T)
#Read in individual data files
ndmi <- read_csv(files[1]) %>%
rename(burned=2,unburned=3) %>%
mutate(data='ndmi')
ndsi <- read_csv(files[2]) %>%
rename(burned=2,unburned=3) %>%
mutate(data='ndsi')
ndvi <- read_csv(files[3])%>%
rename(burned=2,unburned=3) %>%
mutate(data='ndvi')
# Stack as a tidy dataset
full_long <- rbind(ndvi,ndmi,ndsi) %>%
gather(key='site',value='value',-DateTime,-data) %>%
filter(!is.na(value))
3.1 Question 1
What is the correlation between NDVI and NDMI? Here I want you to convert the full_long dataset into a wide dataset using the function "spread", and then make a plot that shows the correlation as a function of whether the site was burned or not (x axis should be ndmi). You should exclude winter months and focus on summer months.
# Convert data from long format to wide format
full_wide <- pivot_wider(full_long, names_from = "data", values_from = "value")
# Select months and plot NDVI vs NDMI
full_wide %>%
mutate(month=month(DateTime), year=year(DateTime)) %>%
filter(!month %in% c(9,10,11,12,1,2,3,4,5)) %>%
ggplot(aes(x = ndmi, y = ndvi, color = site)) + geom_point() + geom_smooth() + ggtitle('Figure 1: NDVI vs NDMI') + xlab('NDMI') + ylab('NDVI')
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 11 rows containing non-finite values (stat_smooth).
## Warning: Removed 11 rows containing missing values (geom_point).

Overall, NDMI and NDVI are positively correlated: as moisture increases, so does vegetation. However, past a certain point, when moisture becomes too high, vegetation no longer increases and either plateaus or decreases. This makes sense for instances of snow, or if vegetation has a specific range of moisture in which it thrives. If the NDMI is above the vegetation's upper moisture limit, the vegetation may begin to suffer, and therefore NDVI would decrease. There is also a difference between the trend for the unburned site and the burned site: vegetation is lower for the burned site no matter the moisture level. This makes sense, as the burned site has fire damage, which will impact the vegetation for years after the fire. Overall, both trends are positively correlated up to about the same point.
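The strength of this relationship can also be quantified with a Pearson correlation; a minimal sketch, assuming the full_wide tibble and the same summer-month filter used for Figure 1 (grouping by site mirrors the burned/unburned split):

```r
library(dplyr)
library(lubridate)

# Pearson correlation between NDMI and NDVI for the summer months
# kept in Figure 1, computed separately for each site
full_wide %>%
  mutate(month = month(DateTime)) %>%
  filter(month %in% c(6, 7, 8)) %>%
  group_by(site) %>%
  summarize(r = cor(ndmi, ndvi, use = "complete.obs"))
```

A value near +1 for each site would confirm the positive trend the smoothed curves suggest.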
3.2 Question 2
What is the correlation between average NDSI (normalized snow index) for January - April and average NDVI for June-August? In other words, does the previous year’s snow cover influence vegetation growth for the following summer?
# Convert NDVI dataset
ndvi_long <- pivot_longer(ndvi, cols = c(burned, unburned), names_to = "site", values_to = "value") %>%
filter(!is.na(value)) %>%
transform(value = as.numeric(value))
#find averages for each year, for June- August
NDVI_avgs <- ndvi_long %>%
mutate(month=month(DateTime), year=year(DateTime)) %>%
filter(month %in% c(6,7,8)) %>%
group_by(year, data) %>%
summarize(mean = mean(value))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# Convert NDSI dataset
ndsi_long <- pivot_longer(ndsi, cols = c(burned, unburned), names_to = "site", values_to = "value") %>%
filter(!is.na(value)) %>%
transform(value = as.numeric(value))
#find averages for each year, for January-April
NDSI_avgs <- ndsi_long %>%
mutate(month=month(DateTime), year=year(DateTime)) %>%
filter(month %in% c(1,2,3,4)) %>%
group_by(year, data) %>%
summarize(mean = mean(value))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
#merge datasets
finaldf <- rbind(NDSI_avgs, NDVI_avgs)
#plot
ggplot(finaldf) + aes(x = year, y = mean, color = data) + geom_point() + geom_line() + ggtitle('Figure 2: Snow Effect on Vegetation') + xlab('Year') + ylab('Mean')
It appears that there is no trend between snowfall in January-April and vegetation in June-August. The range in snow index does not have a large effect on vegetation index. For example, the large drop in average snow index in 1988 does not have an impact on the vegetation. This is an interesting lack of relationship, as I would expect the amount of snow to have a large impact on the greenness for the upcoming season.
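One way to put a number on this lack of relationship is to pair each year's winter NDSI average with the same year's summer NDVI average and compute a correlation; a sketch, assuming the NDSI_avgs and NDVI_avgs tibbles built above (columns year, data, mean). A value near zero would support the visual impression:

```r
library(tidyr)

# Spread the stacked yearly means so each year has one ndsi and one ndvi value
snow_veg <- pivot_wider(rbind(NDSI_avgs, NDVI_avgs),
                        names_from = data, values_from = mean)

# Correlation between winter snow cover and summer greenness across years
cor(snow_veg$ndsi, snow_veg$ndvi, use = "complete.obs")
```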
3.3 Question 3
How is the snow effect from question 2 different between pre- and post-burn and burned and unburned?
#find averages for each year, for January-April
NDSI_avgs2 <- ndsi_long %>%
mutate(month=month(DateTime), year=year(DateTime)) %>%
filter(month %in% c(1,2,3,4)) %>%
group_by(year, data, site) %>%
summarize(mean = mean(value))
## `summarise()` has grouped output by 'year', 'data'. You can override using the `.groups` argument.
#find averages for each year, for June- August
NDVI_avgs2 <- ndvi_long %>%
mutate(month=month(DateTime), year=year(DateTime)) %>%
filter(month %in% c(6,7,8)) %>%
group_by(year, data, site) %>%
summarize(mean = mean(value))
## `summarise()` has grouped output by 'year', 'data'. You can override using the `.groups` argument.
#merge datasets
finaldf2 <- rbind(NDSI_avgs2, NDVI_avgs2)
#plot
ggplot(finaldf2) + aes(x = year, y = mean, color = data, shape = site) + geom_point() + geom_line() + ggtitle('Figure 3: Snow Effect on Vegetation At Two Sites Before and After 2003 Fire') + xlab('Year') + ylab('Mean')
The fire occurred in 2003, and before the fire there is not a large difference between the NDVI for the (future) burned vs. unburned areas. After the fire, there is a large difference between NDVI for the burned vs. unburned areas, which is to be expected, as fires can impact vegetation for decades after they happen. Post-burn, it appears that NDSI has a larger impact on NDVI than pre-burn. This would make sense, as post-fire vegetation is likely more susceptible to its surroundings and less able to adapt.
3.4 Question 4
What month is the greenest month on average?
#create month column
monthndvi <- ndvi_long %>%
mutate(month=month(DateTime))
#find means by month
monthlymeans_NDVI <- monthndvi %>%
group_by(month) %>%
summarize(mean = mean(value))
#create a table
kable(monthlymeans_NDVI, caption = 'NDVI Averages by Month')
| month | mean |
|---|---|
| 1 | 0.2186012 |
| 2 | 0.1986539 |
| 3 | 0.2019000 |
| 4 | 0.2501716 |
| 5 | 0.3036180 |
| 6 | 0.3512811 |
| 7 | 0.3633104 |
| 8 | 0.3871568 |
| 9 | 0.3827147 |
| 10 | 0.3570475 |
| 11 | 0.2852874 |
| 12 | 0.2513979 |
NDVI is the normalized difference vegetation index, and measures the density of green vegetation on land. As such, I can find the average NDVI of different months and determine which month, on average, is the greenest. The averages range from 0.1986 (February) to 0.3871 (August). In other words, August was the greenest month on average. The second greenest was September at 0.3827. The least green month was February at 0.1986.
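The greenest month can also be pulled out of the table programmatically rather than read off by eye; a base-R sketch assuming the monthlymeans_NDVI tibble above:

```r
# Month whose mean NDVI is highest (August in the table above)
monthlymeans_NDVI$month[which.max(monthlymeans_NDVI$mean)]
```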
3.5 Question 5
What month is the snowiest on average?
#create a month column
monthndsi <- ndsi_long %>%
mutate(month=month(DateTime))
#find means by month
monthlymeans_NDSI <- monthndsi %>%
group_by(month) %>%
summarize(mean = mean(value))
#create a table
kable(monthlymeans_NDSI, caption = 'NDSI Averages by Month')
| month | mean |
|---|---|
| 1 | 0.2099584 |
| 2 | 0.1988183 |
| 3 | -0.0420229 |
| 4 | -0.2278234 |
| 5 | -0.4105171 |
| 6 | -0.4409882 |
| 7 | -0.4504829 |
| 8 | -0.4594660 |
| 9 | -0.4539411 |
| 10 | -0.4085219 |
| 11 | -0.1151240 |
| 12 | 0.1241000 |
NDSI is the normalized difference snow index, and detects the presence of snow on land. As such, I can find the average NDSI of different months and determine which month, on average, is the snowiest. The averages range from -0.4594 (August) to 0.2099 (January). In other words, January was the snowiest month on average. The second snowiest was February at 0.1988. The least snowy month was August at -0.4594.
3.6 Bonus Question: Redo all problems with spread and gather using modern tidyverse syntax. (Completed)
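For reference, the gather call at the top of this assignment maps onto pivot_longer like so (a sketch of the modern equivalent; full_long is the same tibble either way):

```r
library(dplyr)
library(tidyr)

# Modern replacement for gather(key = 'site', value = 'value', -DateTime, -data)
full_long <- rbind(ndvi, ndmi, ndsi) %>%
  pivot_longer(cols = -c(DateTime, data),
               names_to = 'site', values_to = 'value') %>%
  filter(!is.na(value))
```

The reverse step, spread, is likewise replaced by pivot_wider, as used in Question 1.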
4 Assignment 3: Snow Functions Iteration
title: "Snow Data Assignment: Web Scraping, Functions, and Iteration"
author: "Samantha Clark"
date: "2-7-2022"
output: html_document
4.1 Simple web scraping
R can read html using either rvest, xml, or xml2 packages. Here we are going to navigate to the Center for Snow and Avalanche Studies website and read a table in. This table contains links to data we want to programmatically download for three sites. We don't know much about these sites, but they contain incredibly rich snow, temperature, and precipitation data.
4.1.1 Reading an html
4.1.1.1 Extract CSV links from webpage
site_url <- 'https://snowstudies.org/archived-data/'
#Read the web url
webpage <- read_html(site_url)
#See if we can extract tables and get the data that way
tables <- webpage %>%
html_nodes('table') %>%
magrittr::extract2(3) %>%
html_table(fill = TRUE)
#That didn't work, so let's try a different approach
#Extract only weblinks and then the URLs!
links <- webpage %>%
html_nodes('a') %>%
.[grepl('24hr',.)] %>%
html_attr('href')
4.1.2 Data Download
4.1.2.1 Download data in a for loop
#Grab only the name of the file by splitting out on forward slashes
splits <- str_split_fixed(links,'/',8)
#Keep only the 8th column
dataset <- splits[,8]
#generate a file list for where the data goes
file_names <- paste0('data/',dataset)
for(i in 1:3){
download.file(links[i],destfile=file_names[i])
}
downloaded <- file.exists(file_names)
evaluate <- !all(downloaded)
4.1.2.2 Download data in a map
#Map version of the same for loop (downloading 3 files)
if(evaluate == T){
map2(links[1:3],file_names[1:3],download.file)
}else{print('data already downloaded')}
## [[1]]
## [1] 0
##
## [[2]]
## [1] 0
##
## [[3]]
## [1] 0
4.1.3 Data read-in
4.1.3.1 Read in just the snow data as a loop
#Pattern matching to only keep certain files
snow_files <- file_names %>%
.[!grepl('SG_24',.)] %>%
.[!grepl('PTSP',.)]
#empty_data <- list()
# snow_data <- for(i in 1:length(snow_files)){
# empty_data[[i]] <- read_csv(snow_files[i]) %>%
# select(Year,DOY,Sno_Height_M)
# }
#snow_data_full <- do.call('rbind',empty_data)
#summary(snow_data_full)
4.1.3.2 Read in the data as a map function
#Create a map function
our_snow_reader <- function(file){
name = str_split_fixed(file,'/',2)[,2] %>%
gsub('_24hr.csv','',.)
df <- read_csv(file) %>%
select(Year,DOY,Sno_Height_M) %>%
mutate(site = name)
}
#read in the data with function
snow_data_full <- map_dfr(snow_files,our_snow_reader)
## Rows: 6211 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (52): ArrayID, Year, DOY, Hour, LoAir_Min_C, LoAir_Min_Time, LoAir_Max_C...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 6575 Columns: 48
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (48): ArrayID, Year, DOY, Hour, LoAir_Min_C, LoAir_Min_Time, LoAir_Max_C...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(snow_data_full)
## Year DOY Sno_Height_M site
## Min. :2003 Min. : 1.0 Min. :-3.523 Length:12786
## 1st Qu.:2008 1st Qu.: 92.0 1st Qu.: 0.350 Class :character
## Median :2012 Median :183.0 Median : 0.978 Mode :character
## Mean :2012 Mean :183.1 Mean : 0.981
## 3rd Qu.:2016 3rd Qu.:274.0 3rd Qu.: 1.520
## Max. :2021 Max. :366.0 Max. : 2.905
## NA's :4554
4.1.3.3 Plot snow data
#create a new data set from full data (group data and get means)
snow_yearly <- snow_data_full %>%
group_by(Year,site) %>%
summarize(mean_height = mean(Sno_Height_M,na.rm=T))
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
#plot the new data set
ggplot(snow_yearly,aes(x=Year,y=mean_height,color=site)) +
geom_point() +
ggthemes::theme_few() +
ggthemes::scale_color_few()
## Warning: Removed 2 rows containing missing values (geom_point).

4.2 Assignment:
- Extract the meteorological data URLs. Here we want you to use the rvest package to get the URLs for the SASP forcing and SBSP_forcing meteorological datasets.
library(rvest)
site_url1 <- 'https://snowstudies.org/archived-data/'
#Read the web url
webpage1 <- read_html(site_url1)
#extract weblinks and urls
hwlinks <- webpage1 %>%
html_nodes('a') %>%
.[grepl('forcing',.)] %>%
html_attr('href')
hwlinks
## [1] "https://snowstudies.org/wp-content/uploads/2022/02/SBB_SASP_Forcing_Data.txt"
## [2] "https://snowstudies.org/wp-content/uploads/2022/02/SBB_SBSP_Forcing_Data.txt"
- Download the meteorological data. Use the download.file and str_split_fixed commands to download the data and save it in your data folder. You can use a for loop or a map function.
#Download data
#Grab only the name of the file by splitting out on forward slashes
hwsplit <- str_split_fixed(hwlinks,'/',8)
#Keep only the 8th column
hwdataset <- hwsplit[,8]
#generate a file list for where the data goes
hwfilenames <- paste0('data/',hwdataset)
for(i in 1:2){
download.file(hwlinks[i],destfile=hwfilenames[i])
}
hwdownloaded <- file.exists(hwfilenames)
hwevaluate <- !all(hwdownloaded)
- Write a custom function to read in the data and append a site column to the data.
# Grab the variable names from the pdf
library(pdftools)
## Using poppler version 20.12.1
headers <- pdf_text('https://snowstudies.org/wp-content/uploads/2022/02/Serially-Complete-Metadata-text08.pdf') %>%
readr::read_lines(.) %>%
trimws(.) %>%
str_split_fixed(.,'\\.',2) %>%
.[,2] %>%
.[1:26] %>%
str_trim(side = "left")
#Read in data
forcing_reader <- function(hwfilenames){
hwdf <- read_fwf(hwfilenames)
names(hwdf) <- headers[1:ncol(hwdf)] # the forcing files have fewer columns than the metadata lists
hdr1 = str_split_fixed(hwfilenames,'_', 3)[, 2]
mutate(hwdf, site=hdr1)
}
- Use the map function to read in both meteorological files. Display a summary of your tibble.
#use function to read in files
forcing_data_full <- map_dfr(hwfilenames,forcing_reader)
## Rows: 69168 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
##
## chr (2): X12, X14
## dbl (17): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X13, X15, X16, X17, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: The `value` argument of `names<-` must have the same length as `x` as of tibble 3.0.0.
## `names` must have length 19, not 26.
## Rows: 69168 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
##
## chr (2): X12, X14
## dbl (17): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X13, X15, X16, X17, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: The `value` argument of `names<-` must have the same length as `x` as of tibble 3.0.0.
## `names` must have length 19, not 26.
#show summary tibble
summary(forcing_data_full)
## year month day hour minute
## Min. :2003 Min. : 1.000 Min. : 1.00 Min. : 0.00 Min. :0
## 1st Qu.:2005 1st Qu.: 3.000 1st Qu.: 8.00 1st Qu.: 5.75 1st Qu.:0
## Median :2007 Median : 6.000 Median :16.00 Median :11.50 Median :0
## Mean :2007 Mean : 6.472 Mean :15.76 Mean :11.50 Mean :0
## 3rd Qu.:2009 3rd Qu.: 9.000 3rd Qu.:23.00 3rd Qu.:17.25 3rd Qu.:0
## Max. :2011 Max. :12.000 Max. :31.00 Max. :23.00 Max. :0
## second precip [kg m-2 s-1] sw down [W m-2] lw down [W m-2]
## Min. :0 Min. :0.000e+00 Min. :-9999.000 Min. :-9999.0
## 1st Qu.:0 1st Qu.:0.000e+00 1st Qu.: -3.510 1st Qu.: 173.4
## Median :0 Median :0.000e+00 Median : -0.344 Median : 231.4
## Mean :0 Mean :3.838e-05 Mean :-1351.008 Mean :-1325.7
## 3rd Qu.:0 3rd Qu.:0.000e+00 3rd Qu.: 294.900 3rd Qu.: 272.2
## Max. :0 Max. :6.111e-03 Max. : 1341.000 Max. : 365.8
## air temp [K] windspeed [m s-1] relative humidity [%] pressure [Pa]
## Min. :242.1 Min. :-9999.000 Length:138336 Min. :63931
## 1st Qu.:265.8 1st Qu.: 0.852 Class :character 1st Qu.:63931
## Median :272.6 Median : 1.548 Mode :character Median :65397
## Mean :272.6 Mean : -790.054 Mean :65397
## 3rd Qu.:279.7 3rd Qu.: 3.087 3rd Qu.:66863
## Max. :295.8 Max. : 317.300 Max. :66863
## specific humidity [g g-1] calculated dewpoint temperature [K]
## Length:138336 Min. : 0.0
## Class :character 1st Qu.: 0.0
## Mode :character Median : 0.0
## Mean : 387.3
## 3rd Qu.: 0.0
## Max. :2009.0
## precip, WMO-corrected [kg m-2 s-1]
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 0.0
## Mean : 466.2
## 3rd Qu.: 0.0
## Max. :3002.0
## air temp, corrected with Kent et al. (1993) [K]
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 5.754
## 3rd Qu.: 0.000
## Max. :5002.000
## air temp, corrected with Anderson and Baumgartner (1998)[K]
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 0.0
## Mean : 654.3
## 3rd Qu.: 0.0
## Max. :4009.0
## air temp, corrected with Nakamura and Mahrt (2005) [K] site
## Min. : 0.0 Length:138336
## 1st Qu.: 0.0 Class :character
## Median : 0.0 Mode :character
## Mean : 155.4
## 3rd Qu.: 0.0
## Max. :6009.0
- Make a line plot of mean temp by year by site (using the air temp [K] variable). Is there anything suspicious in the plot? Adjust your filtering if needed.
# prep data, find means
forcing_yearly <- forcing_data_full %>%
group_by(year, site) %>%
filter(year %in% c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011)) %>%
summarize(
mean = mean(`air temp [K]`)
)
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# plot
ggplot(forcing_yearly, aes(x = year, y = mean, color = site)) + geom_line()
On average, air temperature increased from 2004 to 2011. The SBSP site's air temperature is, on average, always a few degrees cooler than the SASP site's.
2003 was filtered out because it contains only a small amount of data, all from winter, which produces an unrepresentatively cold mean.
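A quick way to spot the suspicious year is to count observations per year before averaging; a sketch assuming forcing_data_full as read in above (2003 shows far fewer rows than a full year of hourly records):

```r
# Observations per year; a short, winter-only year biases its mean cold
table(forcing_data_full$year)
```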
- Write a function that makes line plots of monthly average temperature at each site for a given year. Use a for loop to make these plots for 2005 to 2010. Are monthly average temperatures at the Senator Beck Study Plot ever warmer than the Snow Angel Study Plot? Hint: https://ggplot2.tidyverse.org/reference/print.ggplot.html
# write a function to plot
lineplotter <- function(df, yr){
temp_month <- df %>%
group_by(year, month, site) %>%
summarize(
meantemp = mean(`air temp [K]`, na.rm = T)) %>%
filter(year == yr)
linegraph <-
ggplot(temp_month, aes(x = month, y = meantemp, color = site)) + geom_line() + labs(x = 'Month', y = 'Average Air Temperature [K]', title = yr)
print(linegraph)
}
# create a list of years to plot
years <- c(2005, 2006, 2007, 2008, 2009, 2010)
#create a for loop
for (yr in years){
lineplotter(forcing_data_full, yr)
}
## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

## `summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.

Each year's average air temperature varies in a similar pattern: starting low in January/February, increasing until peaking around August, and then decreasing again through the end of the year. This makes sense, as temperatures are lower in winter (the beginning and end of the year) and higher in summer (especially July/August). The SBSP site's average air temperature is consistently lower than the SASP site's.
Bonus: Make a plot of average daily precipitation by day of year (averaged across all available years). Color each site.
# prep data and find means
dailyprecip <- forcing_data_full %>%
group_by(month, day) %>%
summarize(
meandailyprecip = mean(`precip [kg m-2 s-1]`))
## `summarise()` has grouped output by 'month'. You can override using the `.groups` argument.
# here I tried to create a month/day combo column to be able to plot by
dailyprecip_dates <- dailyprecip %>%
unite("DM", day:month, remove = FALSE)
# plot
ggplot(dailyprecip_dates, aes(x=DM, y=meandailyprecip)) + geom_point()
Bonus #2: Use a function and for loop to create yearly plots of precipitation by day of year. Color each site.
# edit data to have a date column
forcing_data_full$Date <- as.Date(with(forcing_data_full,paste(year,month,day,sep="-")),"%Y-%m-%d")
# write a function to plot
precipplotter <- function(df, yr){
precip_days <- df %>%
group_by(month, day, year, site) %>%
filter(year == yr)
precipgraph <-
ggplot(precip_days, aes(x = Date, y = `precip [kg m-2 s-1]`)) + geom_line() + labs(x = 'Date', y = 'Precip', title = yr)
print(precipgraph)
}
# create a list of years to plot
years <- c(2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011)
# create a for in loop
for (yr in years){
precipplotter(forcing_data_full, yr)
}
5 Assignment 4: LAGOS Spatial Analyses 1
title: "LAGOS Spatial Analysis"
author: "Matthew Ross, completed by Samantha Clark"
date: "2/23/2022"
output: html_document
editor_options:
  chunk_output_type: console
5.1 LAGOS Analysis
5.1.1 Loading in data
#install.packages(c("RApiSerialize", "LAGOSNE", 'USAboundaries'))
#LAGOSNE::lagosne_get(dest_folder = LAGOSNE:::lagos_path())
5.1.1.1 First download and then specifically grab the locus (or site lat longs)
# #Lagos download script
#LAGOSNE::lagosne_get(dest_folder = LAGOSNE:::lagos_path())
#Load in lagos
lagos <- lagosne_load()
## Warning in (function (version = NULL, fpath = NA) : LAGOSNE version unspecified,
## loading version: 1.087.3
#Grab the lake centroid info
lake_centers <- lagos$locus
5.1.1.2 Convert to spatial data
#Look at the column names
#names(lake_centers)
#Look at the structure
#str(lake_centers)
#View the full dataset
#View(lake_centers %>% slice(1:100))
spatial_lakes <- st_as_sf(lake_centers,coords=c('nhd_long','nhd_lat'),
crs=4326) %>%
st_transform(2163)
#Subset for plotting
subset_spatial <- spatial_lakes %>%
slice(1:100)
subset_baser <- spatial_lakes[1:100,]
#Dynamic mapviewer
mapview(subset_spatial)
5.1.1.3 Subset to only Minnesota
states <- us_states()
#Plot all the states to check if they loaded
#mapview(states)
minnesota <- states %>%
filter(name == 'Minnesota') %>%
st_transform(2163)
#Subset lakes based on spatial position
minnesota_lakes <- spatial_lakes[minnesota,]
#Plotting the first 1000 lakes
minnesota_lakes %>%
arrange(-lake_area_ha) %>%
slice(1:1000) %>%
st_transform(4326) %>%
mapview()
mapviewOptions(fgb = F)
5.2 In-Class work
5.2.1 1) Show a map outline of Iowa and Illinois (similar to Minnesota map upstream)
#subset to Iowa and Illinois
IAandIL <- states %>%
filter(name %in% c('Iowa', 'Illinois')) %>%
st_transform(2163)
#create a map
mapview(IAandIL)
5.2.2 2) Subset LAGOS data to these sites, how many sites are in Illinois and Iowa combined? How does this compare to Minnesota?
#Subset lakes based on spatial position
IAandIL_lakes <- spatial_lakes[IAandIL,]
nrow(IAandIL_lakes)
## [1] 16466
nrow(minnesota_lakes)
## [1] 29038
Illinois and Iowa have 16,466 sites combined, while Minnesota alone has 29,038 sites, nearly double that number.
5.2.3 3) What is the distribution of lake size in Iowa vs. Minnesota?
- Here I want to see a histogram plot with lake size on x-axis and frequency on y axis (check out geom_histogram)
- make histogram log
# subset to just Iowa data
iowa <- states %>%
filter(name == 'Iowa') %>%
st_transform(2163)
# create spatial lake data for Iowa
iowa_lakes <- spatial_lakes[iowa,]
iowa_lakes$state <- 'Iowa'
# get a lakes set for Minnesota
minnesota_lakes$state <- 'Minnesota'
#Combine Iowa and Minnesota Data
IAandMN_lakes <- bind_rows(iowa_lakes, minnesota_lakes)
names(IAandMN_lakes)
## [1] "lagoslakeid" "nhdid" "gnis_name"
## [4] "lake_area_ha" "lake_perim_meters" "nhd_fcode"
## [7] "nhd_ftype" "iws_zoneid" "hu4_zoneid"
## [10] "hu6_zoneid" "hu8_zoneid" "hu12_zoneid"
## [13] "edu_zoneid" "county_zoneid" "state_zoneid"
## [16] "elevation_m" "state" "geometry"
#Plot the lakes
ggplot(IAandMN_lakes) + aes(x = lake_area_ha, fill = state) + geom_histogram() + scale_x_log10() + ylab('Frequency') + xlab('Lake Size in Hectares') + facet_wrap(vars(state))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are more lakes in Minnesota than in Iowa, but the distribution of lake sizes is similar: both states have many small lakes and comparatively few large ones.
5.2.4 4) Make an interactive plot of lakes in Iowa and Illinois and color them by lake area in hectares
# Use mapview to create an interactive plot of lakes
mapview(IAandIL_lakes, zcol = "lake_area_ha", at = c(0, 5, 10, 100, 250, 500, 750, 1000, 5000, 10000))